Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios

This paper introduces VHR-PRO_IT (Very High-Resolution PROjections for ITaly), an open access hourly climate projection with a resolution of ≃2.2 km (i.e., convection-permitting scale) up to 2050, covering the Italian peninsula and some neighbouring areas. VHR-PRO_IT is produced within the Highlander project (https://highlanderproject.eu/) by dynamically downscaling the Italy8km-CM climate projection (spatial resolution ≃8 km; output frequency = 6 h; driving CMIP5 GCM = CMCC-CM) with the Regional Climate Model COSMO-CLM under the IPCC RCP4.5 and RCP8.5 scenarios. It covers the 62-year period 1989–2050. VHR-PRO_IT is intended for research purposes in the field of climate studies. For example, it may be included in the ongoing activities to clarify the added value of running climate simulations at the convection-permitting scale.

(≃31 km) to a resolution of ≃2.2 km using the regional climate model COSMO-CLM 17 . VHR-REA_IT has been created to bring the potential of ERA5 to the convection-permitting scale. Its performance has been thoroughly evaluated against gridded observations to quantify its spatial and temporal added value for temperature and precipitation. Reder et al. 9 adopted the same workflow, model, and configuration to produce ERA5@2 km, a high-resolution dataset for 20 European cities which provides data to estimate expected hourly precipitation at fixed return periods, adopted as input for pluvial flooding risk analysis 18 . ERA5@2 km has been evaluated against a set of observational datasets (comparable in spatial and temporal resolution), providing a more precise understanding of its added value in the localisation and magnitude of precipitation events at the urban scale, confirming the gain of CP-RCMs for the representation of extremes.
This work presents VHR-PRO_IT (Very High-Resolution PROjections for ITaly), an open access convective-scale climate projection up to 2050 covering the Italian peninsula and some neighbouring areas. VHR-PRO_IT is obtained within the Highlander project as a follow-up of VHR-REA_IT. It is produced by dynamically downscaling the Italy8km-CM climate projection 19,20 (spatial resolution ≃8 km; output frequency = 6 h; driving CMIP5 GCM = CMCC-CM 21) to the same spatial (≃2.2 km) and temporal (hourly) resolution as VHR-REA_IT. Its global forcing is the historical experiment driven by the observed natural and anthropogenic atmospheric composition for 1989-2005 and the RCP4.5 and RCP8.5 greenhouse gas concentration trajectories 22 for 2006-2050.
Although it accounts for two greenhouse gas concentration trajectories, VHR-PRO_IT is a single-model projection of future climate change and therefore does not sample the effects of internal variability 23,24, which are relevant at the regional scale, especially for precipitation. It is mainly intended for research purposes in the field of climate studies. Its use in downstream applications (e.g. to support decision-making and adaptation) is not recommended, because no multi-model, long-term CP-RCM ensemble is available over Italy that would allow an adequate evaluation of the uncertainties and robustness of its findings. Furthermore, applying bias-correction techniques 25-27 to post-process the data before their use in climate change impact or adaptation studies is strongly recommended to correct systematic biases against observations. Finally, it is worth noting that RCP4.5 can be considered likely under current policies, whereas RCP8.5, often called business as usual, should be clearly labelled as the unlikely worst case 28, especially for climate impact studies. However, it remains relevant for research, as it was produced within the CORDEX initiative.
In any case, VHR-PRO_IT may represent a valuable dataset to investigate the added value of running CP-RCMs. Assessing this added value has been a goal of climate studies in recent years, as evidenced by different flagship activities on the topic (e.g. CORDEX FPS convection and EUCP). VHR-PRO_IT is expected to support this goal by helping to understand whether there is an added value in adopting CP-RCM climate projections in the different Italian contexts and by supporting the development of cutting-edge approaches to exploit CP-RCM outputs. These issues are precisely the main ambitions of the Highlander project, within which this climate projection was produced.

Methods
Regional Climate Model COSMO-CLM. COSMO-CLM 17 is a non-hydrostatic limited-area model designed for climate simulations from the meso-β (~20-200 km) to meso-γ (~2-20 km) scale. It uses finite difference methods to solve the governing equations of fully compressible fluid dynamics on a structured grid.
Horizontal advection is calculated with a fifth-order upwind scheme, while vertical advection is calculated using an implicit Crank-Nicolson scheme 29. Time integration is performed with a third-order split-explicit Runge-Kutta discretisation 30. Cloud microphysics is modelled with a single-moment scheme 31 using five hydrometeors (cloud water, rain, ice crystals, snow and graupel). The radiation scheme is based on the two-stream approach described by Ritter and Geleyn 32. The turbulent fluxes within the planetary boundary layer are parameterised using a scheme based on turbulent kinetic energy (TKE) 33,34. The Tiedtke mass-flux scheme 35 is the default COSMO-CLM convective parameterisation. It is a mass-flux closure approach used to reproduce changes in the vertical structure of the atmosphere due to deep, mid-level and shallow convection. In the convection-permitting setup (i.e. the one used for VHR-PRO_IT), only the shallow convection part of the scheme is active, while the part for deeper convection remains deactivated.
Soil moisture is modelled using the soil model TERRA_ML 36 with a formulation for water runoff depending on orography. In addition, COSMO-CLM allows turning on the module TERRA-URB 37 to simulate urban areas properly. TERRA-URB is a bulk scheme relying on a tile approach to distinguish, for each grid cell, between urban canopy and natural land cover and to compute adjusted soil and water fluxes that account for the features of the urban environment. This module has been activated to produce the VHR-PRO_IT data, exploiting the very high horizontal resolution, which allows urban land cover to be represented in more detail.
Climate simulation setup. The downscaling activity has been performed using the COSMO-CLM It has also been adopted by several institutes acting in the Climate Limited-area Modelling-Community as a reference for climate-mode experiments in the framework of the CORDEX-FPS convection. The COSMO-DE setup is also the configuration employed for the downscaling of ERA5 to obtain VHR-REA_IT. Table 1 summarises the main features of the model configuration.
has been interpolated to the rotated latitude-longitude grid of the COSMO-CLM through the INT2LM program. This tool provides the initial and boundary data necessary to run the COSMO-CLM. Generally, data from the global model GME (i.e., the icosahedral grid point model of DWD), the Integrated Forecasting System (IFS, i.e., the spectral model of ECMWF) and the regional COSMO-CLM itself, as in the present case of Italy8km-CM, can be processed directly, avoiding the pre-processing phase. Finally, the long-term climate simulation has been performed by setting up an automatic restart procedure to prevent interruptions of the simulation due to the maximum walltime of the SLURM (Simple Linux Utility for Resource Management) partition.
INT2LM and the COSMO-CLM are implemented for distributed memory parallel computers using the Message Passing Interface (MPI). A Makefile is provided with the source codes, where the compiler call, the options and the necessary libraries can be specified.
The Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC) Foundation performed the long-term climate simulation on the supercomputer cluster GALILEO100 (https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.3%3A+GALILEO100+UserGuide) of the Consorzio Interuniversitario del Nord-Est per il Calcolo Automatico (CINECA). CINECA, as coordinator of the Highlander project, designed, set up and made available all the necessary HPC and cloud infrastructure. The HPC system is equipped with 554 computing nodes with 48 cores each. Each node contains two x86 Intel Xeon Platinum 8276-8276L CPUs (24 cores each at 2.4 GHz). All the computing nodes used have 384 GB of memory.
The long-term simulation has been performed on 54 nodes, corresponding to 2484 cores, taking approximately 43 hours per simulation year. The best performance was obtained by employing 46 of the 48 cores available in each node. About 12 million hours of HPC resources were used for the long-term simulation. A large amount of data was produced (i.e., ≃16.5 TB of output data and more than 53 TB of forcing data, including the 3-dimensional boundary data needed for the downscaling).
Table 2 provides a general overview of the main features of the VHR-PRO_IT dataset. The dataset contains hourly data on a rotated grid (≃2.2 km, irregular/rotated pole grid) with temporal coverage from 01/01/1989 00:00 to 31/12/2050 23:00 (i.e., 1989-2005 for the historical period; 2006-2050 for the future period). These data are delivered in NetCDF format (dimensions = time, longitude, latitude, single vertical level), generally on single levels (i.e., 2 or 10 metres from the surface depending on the selected variable), except for soil moisture, which is available at seven soil levels (i.e., depth = 1, 3, 9, 27, 81, 243, 729 cm from the surface). The reference coordinate system is WGS84 (EPSG 4326). The file naming of the output variables follows the "CORDEX approach" as far as possible and is structured as VariableName_DatasetName&Resolution_GCMModelName_CMIP5ExperimentName_RCMModelName&VersionID_Frequency_DDSid.nc. Table 3 reports the list of the output variables (short and long name), the units of measurement, a description of the meteorological fields, and the corresponding variable short name from the CMIP5/CORDEX standard.
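The compute figures reported above (54 nodes, 46 cores used per node, about 43 hours per simulated year) can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the historical period (1989-2005) was simulated once and the future period (2006-2050) once per scenario; this split of simulated years is an assumption made here for illustration and is not stated explicitly in the text.

```python
# Back-of-the-envelope check of the HPC figures quoted in the text.
NODES = 54
CORES_USED_PER_NODE = 46       # 46 of the 48 cores available per node
WALL_HOURS_PER_SIM_YEAR = 43   # approximate value from the text

# Assumption (not stated in the text): one historical run plus one run
# per RCP scenario over the future period.
historical_years = 2005 - 1989 + 1   # 17 simulated years
future_years = 2050 - 2006 + 1       # 45 simulated years
scenarios = 2                        # RCP4.5 and RCP8.5
simulated_years = historical_years + scenarios * future_years

cores = NODES * CORES_USED_PER_NODE
core_hours = cores * WALL_HOURS_PER_SIM_YEAR * simulated_years

print(f"cores used:       {cores}")
print(f"simulated years:  {simulated_years}")
print(f"core-hours (M):   {core_hours / 1e6:.1f}")
```

Under these assumptions the core count reproduces the 2484 cores quoted above, and the estimate of roughly 11.4 million core-hours is of the same order as the ≃12 million hours reported. Accounting for whole 48-core nodes instead of the 46 cores actually used would raise the estimate to about 11.9 million.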

Technical Validation
This section investigates the robustness of the VHR-PRO_IT dataset in two different aspects:
1. model performance: statistical analysis for the historical period produced by comparing model data to reference observations to measure the model deviation from observations;
2. model consistency: statistical analysis for the future period developed by comparing model data to other climate projections to measure the model convergence towards a similar climate signal.
This analysis is performed at the daily scale for total precipitation and 2m-temperature, and at the hourly scale for total precipitation.
In particular, the analysis at the daily scale considers:
• as reference observations, the daily gridded dataset SCIA-ISPRA of the Italian Environmental Protection Agency (ISPRA), based on the interpolation of data from local weather stations, and available at ~5 km for temperature and ~10 km for precipitation;
• as additional climate projections, the Italy8km-CM climate projection 19 and seventeen GCM + RCM pairs (see Table 4) from the Euro-CORDEX initiative 38 at ~12 km grid spacing under the IPCC RCP4.5 and RCP8.5 scenarios.
Conversely, the analysis at the hourly scale focuses on the summer season (June-July-August), as convective events and processes dominate this season for the region of interest. It considers:
• as reference observations, the hourly gridded dataset GRIPHO 39, relying on the interpolation of data from local weather stations at ~10 km;
• as qualitative additional climate projections, the findings of the 10-year multi-model CP-RCM climate projection ensembles at a horizontal grid spacing of ∼3 km provided by Pichelli et al. 12 over the greater Alpine region under the RCP8.5 scenario.
For both temporal scales, data are extracted and processed, separating Italy (see Fig. 2) into Northern Italy, Central Italy and Southern Italy, as done in Bucchignani et al. 19 .
Analysis at the daily scale: model performance. The model performance at the daily scale (see Fig. 3) is investigated over northern (Fig. 3a), central (Fig. 3b) and southern (Fig. 3c) Italy by assuming as indices the multi-annual average of 2m-temperature and total precipitation over 1989-2005. These indices have first been scaled to the observations and then plotted in the plane of normalised total precipitation versus normalised temperature. The intention is to measure the distance of the pair of values of each climate model from the target (1,1), which represents the normalised observations. The lower the distance, the lower the bias of each model against observations and, thus, the higher its performance.
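The normalisation and distance described above can be sketched as follows. This is an illustrative sketch only: the function names and the numerical values are hypothetical, the Euclidean metric is an assumption (the text only refers to "the distance"), and the result of scaling temperature depends on the unit chosen (°C or K), which the text does not specify.

```python
import math

def normalised_point(model_tas, model_pr, obs_tas, obs_pr):
    """Scale model multi-annual means by the observed ones.

    With this scaling the observations map onto the target point (1, 1).
    Returns (normalised temperature, normalised precipitation).
    """
    return model_tas / obs_tas, model_pr / obs_pr

def distance_to_target(point, target=(1.0, 1.0)):
    """Euclidean distance from the target (assumed metric)."""
    return math.hypot(point[0] - target[0], point[1] - target[1])

# Hypothetical multi-annual means over 1989-2005 for one sub-domain
# (temperature in °C, precipitation in mm/year); not values from the paper.
obs_tas, obs_pr = 13.0, 900.0
models = {"model_A": (12.0, 990.0), "model_B": (12.6, 860.0)}

for name, (tas, pr) in models.items():
    point = normalised_point(tas, pr, obs_tas, obs_pr)
    print(name, round(distance_to_target(point), 3))
```

A model whose normalised temperature is below 1 has a cold bias, and one whose normalised precipitation is above 1 has a wet bias, so the position of a point relative to (1,1) conveys the sign of the bias as well as its size.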
VHR-PRO_IT shows a limited bias with respect to the observations, demonstrating a satisfactory model performance over the historical period. Specifically, it outperforms Italy8km-CM, reducing the bias for 2m-temperature and total precipitation (except for the precipitation bias over northern Italy). Moreover, VHR-PRO_IT falls within the envelope of the Euro-CORDEX models, which supports its reliability. It returns a cold temperature bias with respect to the target, in line with all Euro-CORDEX members. On the other hand, it provides a dry precipitation bias in central and southern Italy and a wet precipitation bias in northern Italy. The sign of the precipitation bias agrees with the Euro-CORDEX members in the north and, with more uncertainty, in southern Italy. However, it is the opposite in central Italy, where most Euro-CORDEX models return a wet precipitation bias.
Analysis at the daily scale: model consistency. The model consistency at the daily scale (see Fig. 4) is investigated over northern (Fig. 4a,b), central (Fig. 4c,d) and southern (Fig. 4e,f) Italy under the RCP4.5 (Fig. 4a,c,e) and RCP8.5 (Fig. 4b,d,f) scenarios by assuming as indices the expected climate changes (2021-2050 vs 1989-2018) of the multi-annual average of 2m-temperature and total precipitation. Data over 1989-2018 are derived by combining historical data (1989-2005) with data for the period 2006-2018 from the RCP8.5 scenario. The indices have first been scaled to the ensemble mean (considered as the reference in terms of convergence toward a consistent climate signal) and then plotted in the plane of normalised total precipitation changes versus normalised temperature changes. The aim is to retrieve the distance of the pair of values of each model from the ensemble mean, i.e., the target point (1,1). The lower the distance, the higher the model consistency.
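The construction of the change signal and its scaling to the ensemble mean can be sketched as below. This is a minimal sketch under stated assumptions: annual values are held in plain `{year: value}` dictionaries, the change is taken as an absolute difference of multi-annual means (the text does not say whether precipitation changes are absolute or relative), and all function names and numbers are hypothetical.

```python
from statistics import fmean

def mean_over(series, start, end):
    """Multi-annual mean of a {year: value} series over [start, end]."""
    return fmean(series[year] for year in range(start, end + 1))

def change_signal(historical, rcp85, scenario):
    """Change of the 2021-2050 mean against the 1989-2018 mean.

    The reference period combines historical data (1989-2005) with
    RCP8.5 data for 2006-2018, as described in the text.
    """
    reference = dict(historical)
    reference.update({year: rcp85[year] for year in range(2006, 2019)})
    return mean_over(scenario, 2021, 2050) - mean_over(reference, 1989, 2018)

def normalise_by_ensemble_mean(changes):
    """Scale each member's change so that the ensemble mean maps to 1."""
    ensemble_mean = fmean(changes.values())
    return {name: value / ensemble_mean for name, value in changes.items()}

# Hypothetical annual mean temperatures (°C) for one model.
historical = {year: 10.0 for year in range(1989, 2006)}
rcp85 = {year: 11.0 for year in range(2006, 2051)}
rcp45 = {year: 10.8 for year in range(2006, 2051)}

print(round(change_signal(historical, rcp85, rcp45), 3))
print(normalise_by_ensemble_mean({"member_1": 1.0, "member_2": 3.0}))
```

Dividing by the ensemble mean becomes unstable when that mean is close to zero, which can happen for precipitation changes; this sketch does not guard against that case.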

VHR-PRO_IT appears remarkably consistent with the Italy8km-CM projections and the EM of Euro-CORDEX models in terms of climate changes for 2m-temperature and total precipitation, falling within its envelope. The main differences are in northern Italy under RCP4.5 and central Italy under RCP8.5 and are mainly due to changes in total precipitation. Interestingly, the Euro-CORDEX members tend to cluster according to the GCM rather than the RCM, except for G3, featuring a divergent behaviour of R4 with respect to R7. In such clustering, the most consistent GCM turns out to be G1.
Table 4. List of climate projections selected from the EURO-CORDEX initiative (surviving entry: GCM CNRM-CERFACS-CM5, RCM CLMcom-CLM-CCLM4-8-17).
Fig. 3. Model performance at the daily scale over (a) northern, (b) central and (c) southern Italy. Each plot shows data as pairs of normalised precipitation bias against normalised temperature bias for VHR-PRO_IT, Italy8km-CM, and the Euro-CORDEX members with their Ensemble Mean (EM). Point (1,1) represents the observations as the target. Euro-CORDEX models are grouped according to GCM (same colours for each GCM) and RCM (same symbol for each RCM).
Analysis at the hourly scale: model performance. The model performance at the hourly scale (see Fig. 5) over the historical period is assessed as in Pichelli et al. 12. The indices considered are mean summer wet-hour intensity (>0.1 mm/h, see Fig. 5a,b), wet-hour frequency (see Fig. 5c,d) and heavy hourly precipitation (i.e., the 99.9th percentile of all events, see Fig. 5e,f). They are computed for VHR-PRO_IT over 1989-2005 (Fig. 5b,d,f) and for GRIPHO over 2001-2010 (the same period considered in Pichelli et al. 12 for this dataset, see Fig. 5a,c,e) and qualitatively compared to the findings obtained by Pichelli et al. 12 over the greater Alpine region (covering northern, central, and a small part of southern Italy) for the time slice 1996-2005. Although different periods are considered, this comparison aims to qualitatively assess the potential bias of VHR-PRO_IT against experiments with similar features, assuming a shared reference (in this case, GRIPHO).
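The three hourly indices named above can be illustrated on a single series of hourly precipitation values. The sketch below is not the processing chain used for the dataset: it assumes that the 99.9th percentile is taken over all hours (wet and dry), it uses linear interpolation between ranks, and the function names and sample values are hypothetical. A helper for the relative bias against a reference is included for completeness.

```python
import math

WET_THRESHOLD = 0.1  # mm/h, wet-hour threshold quoted in the text

def wet_hour_intensity(pr):
    """Mean precipitation over wet hours (> 0.1 mm/h)."""
    wet = [p for p in pr if p > WET_THRESHOLD]
    return sum(wet) / len(wet) if wet else 0.0

def wet_hour_frequency(pr):
    """Fraction of hours that are wet."""
    return sum(1 for p in pr if p > WET_THRESHOLD) / len(pr)

def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def heavy_hourly_precipitation(pr):
    """99.9th percentile over all hours (assumption: dry hours included)."""
    return percentile(pr, 99.9)

def relative_bias(model_value, reference_value):
    """Relative bias in percent against the reference."""
    return 100.0 * (model_value - reference_value) / reference_value

# Hypothetical hourly values (mm/h) for one grid point in summer.
sample = [0.0, 0.05, 0.2, 1.0, 2.8]
print(wet_hour_intensity(sample), wet_hour_frequency(sample))
```

For gridded data the same functions would be applied to each grid point over the June-July-August hours before averaging over a sub-domain.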
VHR-PRO_IT generally reproduces the summer hourly indices for the historical experiment well. Indeed, it agrees with observations in terms of magnitude and spatial correlation, especially over complex orographic areas such as the Alpine region. Such an agreement aligns with the findings of the multi-model CP-RCM climate projection ensembles reported in Pichelli et al. 12. Table 5 shows the biases of the analysed indices against GRIPHO in northern, central and southern Italy. Biases are moderate for the areas examined (+1%/−13% for hourly precipitation intensity, −5%/−25% for heavy hourly precipitation). Compared to the same analyses conducted by Pichelli et al. 12 over Italy (in an analysis domain incorporating northern, central and part of southern Italy), the frequency bias is practically the same. At the same time, a divergence occurs in the sign of the bias for the other two indices (i.e., intensity and heavy precipitation), which is mainly negative in VHR-PRO_IT and mainly positive in the multi-model CP-RCM climate projection ensembles of Pichelli et al. 12. In any case, the biases are limited in both experiments.
The change in precipitation intensity is mainly positive in both scenarios, with the magnitude depending on the RCP considered (higher increase for RCP8.5 than RCP4.5). Conversely, frequency changes are negative across Italy, with minimum values over complex orography contexts (e.g. the Alpine region) and reduced for RCP8.5 compared to RCP4.5. This signal of more intense but less frequent events is consistent with Pichelli et al. 12.

Usage Notes
The dataset is available at CMCC at https://doi.org/10.25424/CMCC-J90A-5P12 40. Data are stored at the CMCC Supercomputing Centre facilities and integrated into the CMCC Data Delivery System (DDS) (http://dds.cmcc.it), a unique, consistent, seamless access point for data produced by CMCC. The DDS user interface allows users to easily compile queries on the dataset by selecting a variable, a geographical area or location, and a period, and to download data through the unified DDS API Python client interface. The file naming of the output variables follows the "CORDEX style" as far as possible and is structured as VariableName_DatasetName&Resolution_GCMModelName_CMIP5ExperimentName_RCMModelName&VersionID_Frequency_DDSid.nc (e.g., TD_2M_VHR-PRO_IT2km_CMCC-CM_rcp45_CCLM5-0-9_1hr_118902.nc). When the output variables are downloaded via Python, the file names can be modified according to the user requirements. The data are also available on the Highlander platform (https://highlanderproject.eu/data), which can be accessed similarly to the DDS service.
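Because both the variable name (e.g., TD_2M) and the dataset name (VHR-PRO_IT) contain underscores, a file name following the convention above cannot be split naively on "_". The sketch below shows one way to recover the fields, assuming that the single example given in the text is representative of all files; the function and field names are hypothetical and are not part of the DDS client.

```python
from typing import NamedTuple

DATASET_NAME = "VHR-PRO_IT"

class FileNameParts(NamedTuple):
    variable: str
    dataset: str
    resolution: str
    gcm: str
    experiment: str
    rcm: str
    frequency: str
    dds_id: str

def parse_file_name(file_name: str) -> FileNameParts:
    """Split a VHR-PRO_IT file name into its components.

    The last five fields are split off from the right, so underscores in
    the variable and dataset names do not interfere.
    """
    if not file_name.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {file_name}")
    pieces = file_name[: -len(".nc")].rsplit("_", 5)
    if len(pieces) != 6:
        raise ValueError(f"unexpected file name layout: {file_name}")
    head, gcm, experiment, rcm, frequency, dds_id = pieces
    # head looks like "<variable>_VHR-PRO_IT<resolution>"
    variable, marker, resolution = head.partition("_" + DATASET_NAME)
    if not marker or not variable:
        raise ValueError(f"dataset name not found in: {file_name}")
    return FileNameParts(variable, DATASET_NAME, resolution, gcm,
                         experiment, rcm, frequency, dds_id)

parts = parse_file_name(
    "TD_2M_VHR-PRO_IT2km_CMCC-CM_rcp45_CCLM5-0-9_1hr_118902.nc"
)
print(parts.variable, parts.experiment, parts.frequency)
```

Since file names can be changed at download time, a parser of this kind only applies to files that keep the default naming.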
The data are in NetCDF format. Other standard interoperable formats (e.g., ESRI grid, GeoTIFF) can be provided on request for a sub-selection of the dataset, whose size will be defined according to the specific user needs and the processing time required. The easiest and fastest way to use the NetCDF format is via command-line tools, such as CDO (Climate Data Operators) and NCO (NetCDF Operators), or via Python, Matlab and R. Some GIS packages, such as ArcGIS (version 9.2 onwards), QGIS and IDRISI Taiga, allow the reading and processing of data. In addition, viewers, such as Panoply, ncBrowse, ncview and nCDF_Browser, allow simple data visualisation and map production.
Please cite this manuscript when using the VHR-PRO_IT dataset or part of it. In addition, please contact the corresponding author for any questions, suggestions or collaboration requests regarding the VHR-PRO_IT dataset.

Code availability and data license
Data are protected by copyright. Distribution and communication of the data to the public are not allowed without the Fondazione CMCC's written authorisation. Access to, consultation of, and reproduction of the data, in whole or in part, for personal use or institutional and research purposes is permitted, but not for commercial purposes and not with the intent to distribute, communicate, or make them available to the public. CINECA, through the Highlander platforms, and CMCC are the only institutions with permission to distribute, communicate, or make the data available to the public. Adapting and transforming the data to create derivative works based on them is permitted, on the condition that adequate attribution is given by citing this paper and providing a link to the dataset DOI (https://doi.org/10.25424/CMCC-J90A-5P12).
CMCC Foundation submits its data to adequate verification activities. However, CMCC Foundation does not assume responsibility for any inaccuracy or omission in them. CMCC Foundation is not responsible for the data and news published when processed by third parties and for the contents provided by any other site starting from such data. CMCC Foundation does not assume responsibility for any decision based on such data, which remains the sole responsibility of the user, nor for any loss or damage, direct or indirect, that may arise from their use.